1. INTRODUCTION

Water quality (WQ) is an essential concern for health, water resources, and the environment [1].
The demand of billions of people for clean, safe, and adequate freshwater has drawn practitioners
and research communities into modeling and monitoring water quality in order to address this
universal concern [2]. WQ can be described as the physical, chemical, and biological
characteristics of water, which are used to determine the extent of water purity.

The water quality index (WQI) is applied worldwide to resolve data management issues and to assess
the successes and failures of management strategies for improving WQ [3]. To determine the overall
status of WQ, a number of sensitive parameters need to be identified critically. Since no single
variable can sufficiently describe WQ, it is generally assessed by computing a broad range of
parameters. As a result, a large data set is generated, which must be presented in a meaningful
way to decision makers, local planners, and the general public. In view of this, WQIs have been
developed to convert the large data set into a single index [4].

The deterioration of water quality caused by inadequate sanitation and industrial pollutants,
together with the unreliability of most available mechanistic models in yielding promising
forecasts, has created a pressing need for other techniques and approaches [5]. Different methods
have been used to measure and predict water quality in order to reduce the time consumed in
collecting data from large data sets and to classify the quality using machine learning [3];
however, the main issues with machine learning methods are their high susceptibility to error and
the acquisition of relevant data sets. Recently, a keen interest has developed in the broad
concept of artificial intelligence as a complement to traditional models [6]. Several
researchers, such as [4-8], have used different neural network approaches in handling WQI.
Nevertheless, most of the available models focus on monitoring and analysis of the water quality
index. Therefore, this paper centres on estimating the water quality index by comparing
artificial intelligence approaches with a conventional method, applied to the Palla station along
the Yamuna River, India.

The artificial neural network (ANN) is an AI-based approach that has proved not only effective in
handling large data sets and complex nonlinear input-output relationships but also a flexible and
powerful computational tool [5, 9-10]. The adaptive neuro-fuzzy inference system (ANFIS), another
AI-based model, has been found to be a successful tool that incorporates the fuzzy Sugeno model
and derives the benefits of both ANN and fuzzy logic in a single system [11].

The performance of the models was evaluated using commonly used measures. The paper is organized
as follows: Section 2 describes the research method, Section 3 presents the results and
discussion, and Section 4 gives the conclusion.


2. RESEARCH METHOD 
2.1. Study area 

The Yamuna River is the largest tributary of the River Ganga and is as sacred and prominent as the
Ganga itself. The Yamuna is 1,376 km long, and almost 57 million residents of northern India rely
upon it. Its total catchment area is 366,223 km², which comprises 42 percent of the Ganga basin
located in the territory of India. Delhi, the capital territory, receives almost 70 percent of its
drinking water from the Yamuna River, which discharges almost 10,000 m³/s yearly. However, due to
urbanization and inadequate water treatment plants, the river leaves Delhi polluted [12-13].
Figure 1 shows the location of the Palla station along the Yamuna River basin in India. The daily
WQ data were obtained from the Central Pollution Control Board (CPCB) for the years 1999 to 2012.
Figure 1. Geographic location of the study area, Palla station, Yamuna River, India
2.2. Modelling 


In this study, ANN, ANFIS, and MLR models were proposed for estimating the WQI of the river. The
data set was partitioned into two parts: 70% of the data were used for the calibration phase and
the remaining 30% for verification. Selection of the dominant input parameters is one of the most
important parts of any AI-based modeling. The functional expressions for the WQI are presented in
(1)-(5). MATLAB 9.3 (R2017b) was used for the ANN and ANFIS analyses, while the MLR model was
developed using the regression tool of EViews 9.5.


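$$\mathrm{WQI}_i = \psi(\mathrm{DO}) \tag{1}$$
$$\mathrm{WQI}_i = \psi(\mathrm{DO}, \mathrm{pH}) \tag{2}$$
$$\mathrm{WQI}_i = \psi(\mathrm{DO}, \mathrm{pH}, \mathrm{BOD}) \tag{3}$$
$$\mathrm{WQI}_i = \psi(\mathrm{DO}, \mathrm{pH}, \mathrm{BOD}, \mathrm{NH_4}) \tag{4}$$
$$\mathrm{WQI}_i = \psi(\mathrm{DO}, \mathrm{pH}, \mathrm{BOD}, \mathrm{NH_4}, \mathrm{WT}) \tag{5}$$


where $\mathrm{WQI}_i$ denotes the water quality index and $\psi$ is a function of dissolved oxygen
(DO), pH, biological oxygen demand (BOD), ammonium (NH4), and water temperature (WT).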
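
The five input combinations in (1)-(5) and the 70%/30% partition can be sketched as follows. This is a minimal illustration, not the authors' MATLAB or EViews workflow: the model labels, the function names, and the sample records are hypothetical, and the split is assumed to be in record order because the text states only the proportions.

```python
# Input combinations from (1)-(5); the keys are illustrative model labels.
INPUT_COMBINATIONS = {
    "I": ["DO"],
    "II": ["DO", "pH"],
    "III": ["DO", "pH", "BOD"],
    "IV": ["DO", "pH", "BOD", "NH4"],
    "V": ["DO", "pH", "BOD", "NH4", "WT"],
}


def split_calibration_verification(records, calibration_fraction=0.7):
    """Split records in order: first part for calibration, the rest for verification.

    Assumption: an unshuffled, in-order split; the source only gives 70%/30%.
    """
    cut = int(round(len(records) * calibration_fraction))
    return records[:cut], records[cut:]


def select_inputs(record, model):
    """Return the predictor values of one record for the given model label."""
    return [record[name] for name in INPUT_COMBINATIONS[model]]


# Hypothetical records (placeholder values, not CPCB measurements).
records = [
    {"DO": 7.0 + 0.1 * k, "pH": 7.5, "BOD": 2.0, "NH4": 0.5, "WT": 20.0, "WQI": 0.8}
    for k in range(10)
]
calibration, verification = split_calibration_verification(records)
print(len(calibration), len(verification))
print(select_inputs(records[0], "III"))
```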


2.3. Multilinear regression analysis (MLR) 

Multilinear regression (MLR) models the linear relationship between a dependent variable and one
or more independent variables. MLR is based on the concept of least squares, in which the
estimated value is expressed as a linear function of the predictors [14], as stated in (6).


$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_i x_i \tag{6}$$


where $x_i$ is the value of the $i$-th predictor, $b_0$ is the regression constant, and $b_i$ is
the coefficient of the $i$-th predictor.
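
As an illustration of the least-squares idea behind (6), the sketch below fits the coefficients by solving the normal equations in plain Python. It is not the EViews procedure used in the study; the function names and the toy data are assumptions made for the example.

```python
def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    n = len(rows[0])
    # Augmented normal-equation system [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for k in range(n):
            if k != c:
                factor = M[k][c]
                M[k] = [a - factor * b for a, b in zip(M[k], M[c])]
    return [M[i][n] for i in range(n)]


def predict_mlr(coeffs, x):
    """Evaluate the fitted linear function for one predictor vector."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))


# Toy data generated from y = 1 + 2*x1 - 3*x2 (hypothetical, noise-free).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1.0, 3.0, -2.0, 0.0, 2.0]
coeffs = fit_mlr(X, y)
print([round(c, 6) for c in coeffs])
```

For ill-conditioned predictors, a QR- or SVD-based solver would be a more robust choice than the normal equations.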


2.4. Artificial neural network (ANN) 

ANNs are mathematical models that aim to handle nonlinear relationships between input and output
data. Historically, they are information processing tools derived by analogy with the biological
nervous system of the brain. The ANN has proved to be an effective tool for predicting nonlinear
systems and is quite capable of handling complex, noisy data sets [15-16]; its prediction accuracy
is high [17]. The back propagation (BP) algorithm is the most commonly used training technique
for ANNs. In BP, each input training sample flows through the network to the output layer, where
the training error is generated and propagated backward until the desired target of the network is
achieved [18]. The primary aim of the BP neural network (BPNN) is to reduce the error so that the
network learns the training data. The sigmoid function and the Levenberg-Marquardt (LM) algorithm
were used as the activation function and training algorithm, respectively. LM was used to train
the multilayer perceptron (MLP) model because of its outstanding performance [19]. Before model
training, the input and output data were normalized to a scale of 0 to 1 using (7). Figure 2
shows the structure of the ANN.


$$X_n = \frac{X_u - X_{\min}}{X_{\max} - X_{\min}} \tag{7}$$


where $X_n$ is the normalized quantity, $X_u$ is the un-normalized quantity, $X_{\min}$ is the
minimum, and $X_{\max}$ is the maximum quantity of the data set.
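
The min-max scaling in (7) can be written as a short helper. The function names are illustrative; the inverse transform is included on the assumption that model outputs are mapped back to the original scale, which the text does not state explicitly.

```python
def min_max_normalize(values):
    """Scale a series to [0, 1] using its own minimum and maximum, as in (7)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("a constant series cannot be min-max normalized")
    span = x_max - x_min
    return [(v - x_min) / span for v in values]


def denormalize(normalized, x_min, x_max):
    """Map normalized values back to the original scale (inverse of (7))."""
    return [v * (x_max - x_min) + x_min for v in normalized]


sample = [2.0, 4.0, 6.0]  # hypothetical un-normalized values
scaled = min_max_normalize(sample)
print(scaled)
print(denormalize(scaled, min(sample), max(sample)))
```

One common design choice is to compute the minimum and maximum on the calibration data only and reuse them for the verification data, so that no information from the verification period leaks into training.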
Figure 2. Structure of an ANN that maps given inputs to an output
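
The layered structure described above can be illustrated with a forward pass through a small feed-forward network. This sketch assumes a sigmoid hidden layer and a linear output neuron, and it uses hypothetical weights; it shows only how inputs are mapped to an output and does not implement the Levenberg-Marquardt training used in the study.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> sigmoid hidden layer -> linear output (assumed)."""
    hidden = [
        sigmoid(sum(w * v for w, v in zip(ws, x)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# A 2-2-1 network with hypothetical (untrained) weights and normalized inputs.
x = [0.3, 0.7]
hidden_weights = [[0.5, -0.4], [0.1, 0.9]]
hidden_biases = [0.0, -0.2]
output_weights = [0.6, 0.4]
output_bias = 0.05
print(forward(x, hidden_weights, hidden_biases, output_weights, output_bias))
```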


2.5. Adaptive neuro-fuzzy inference system (ANFIS) 

Combining an artificial neural network with a fuzzy system creates a robust hybrid system that is
able to solve relationships of a complex nature. As one of the AI models, ANFIS is able to
overcome the individual limitations of fuzzy inference and ANN. The ANFIS model combines the
abilities of ANN and fuzzy logic to create a process capable of handling complex nonlinear
interactions between a set of inputs and outputs [20]. The general structure of ANFIS is shown in
Figure 3. For a typical ANFIS, assuming an FIS with two inputs 'x' and 'y' and one output 'f', a
first-order Sugeno fuzzy model has the following rules:


Rule 1: if $\mu(x)$ is $A_1$ and $\mu(y)$ is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$
Rule 2: if $\mu(x)$ is $A_2$ and $\mu(y)$ is $B_2$, then $f_2 = p_2 x + q_2 y + r_2$

The membership function parameters for the inputs x and y are $A_1$, $B_1$, $A_2$, $B_2$, and the
outlet function parameters are $p_1, q_1, r_1, p_2, q_2, r_2$. The formulation and structure of
ANFIS follow a five-layer neural network arrangement. For more explanation of ANFIS, refer to the
study in [6].
Figure 3. General structure of ANFIS
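
The two-rule first-order Sugeno system can be evaluated as in the sketch below. It assumes triangular membership functions, a product operator for the rule firing strengths, and weighted-average defuzzification, which is the usual five-layer formulation; the membership and consequent parameters are hypothetical, and no parameter learning is performed.

```python
def trimf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def sugeno_two_rule(x, y, A, B, consequents):
    """Evaluate a two-rule first-order Sugeno system for inputs x and y."""
    # Layer 1: membership grades; layer 2: firing strengths (product).
    w = [trimf(x, *A[i]) * trimf(y, *B[i]) for i in range(2)]
    total = sum(w)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    # Layer 4: rule outputs f_i = p_i*x + q_i*y + r_i.
    f = [p * x + q * y + r for p, q, r in consequents]
    # Layers 3 and 5: normalized firing strengths and weighted sum.
    return sum(wi / total * fi for wi, fi in zip(w, f))


# Hypothetical parameters: (a, b, c) per membership function, (p, q, r) per rule.
A = [(-1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]
B = [(-1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]
consequents = [(1.0, 1.0, 0.0), (2.0, 2.0, 1.0)]
print(sugeno_two_rule(0.5, 0.5, A, B, consequents))
```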


2.6. Performance evaluation criteria 

The performance efficiency of a model can be assessed through different statistical measures,
including the determination coefficient (DC), root mean square error (RMSE), and mean square error
(MSE). Therefore, DC, RMSE, and MSE were employed in this study to evaluate the performance of the
ANN, ANFIS, and MLR models [21]. The equations for DC, RMSE, and MSE are given as:
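$$\mathrm{DC} = 1 - \frac{\sum_{i=1}^{N} \left(\mathrm{WQI}_{oi} - \mathrm{WQI}_{ci}\right)^2}{\sum_{i=1}^{N} \left(\mathrm{WQI}_{oi} - \overline{\mathrm{WQI}}_{o}\right)^2} \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(\mathrm{WQI}_{oi} - \mathrm{WQI}_{ci}\right)^2}{N}} \tag{9}$$

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(\mathrm{WQI}_{oi} - \mathrm{WQI}_{ci}\right)^2 \tag{10}$$


where $N$, $\mathrm{WQI}_{oi}$, $\overline{\mathrm{WQI}}_{o}$, and $\mathrm{WQI}_{ci}$ are the data
number, observed data, average value of the observed data, and computed values, respectively. DC
ranges between $-\infty$ and 1, with a perfect score of 1.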
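
The three criteria in (8)-(10) translate directly into code. The sketch below uses illustrative function names and toy values rather than data from the study.

```python
import math


def mse(observed, computed):
    """Mean square error, as in (10)."""
    return sum((o - c) ** 2 for o, c in zip(observed, computed)) / len(observed)


def rmse(observed, computed):
    """Root mean square error, as in (9); equals the square root of the MSE."""
    return math.sqrt(mse(observed, computed))


def determination_coefficient(observed, computed):
    """Determination coefficient, as in (8); 1 indicates a perfect fit."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - c) ** 2 for o, c in zip(observed, computed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]  # toy values
computed = [1.0, 2.0, 3.0, 5.0]
print(mse(observed, computed), rmse(observed, computed))
print(determination_coefficient(observed, computed))
```

Because (9) is the square root of (10), the RMSE and MSE columns of a results table can be cross-checked against each other.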


3. RESULTS AND ANALYSIS 

In this paper, MLR, ANN, and ANFIS were used to estimate the WQI at the Palla station on the
Yamuna River, and their individual performance accuracies were compared. MATLAB 9.3 (R2017b) was
used for the ANN and ANFIS analyses, while the MLR model was developed using the regression tool
of EViews 9.5. Different input parameters were employed for the estimation, as appropriate input
selection is essential [22]. Pearson and Spearman correlation analyses were performed to choose
the input parameters. Five different models, one per input combination, were trained based on the
number and types of inputs. For all the methods, the model types were defined as MLR-I to MLR-V,
ANN-I to ANN-V, and ANFIS-I to ANFIS-V, indicating models one to five for MLR, ANN, and ANFIS,
respectively.
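
The two correlation measures used for input selection can be sketched as follows. Spearman's coefficient is computed here as the Pearson coefficient of the ranks, with tied values given their average rank; the function names and sample values are illustrative and not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predictor and target series with a monotonic, nonlinear relation.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [1.0, 4.0, 9.0, 16.0, 25.0]
print(round(pearson(predictor, target), 4), round(spearman(predictor, target), 4))
```

Pearson's coefficient captures linear association, while Spearman's captures monotonic association, so using both gives a broader view of which inputs are related to the WQI.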


3.1. Result of MLR model 

The MLR model was applied as the classical conventional method for modeling the linear
interactions of the system. It is often used as the reference model for comparison with nonlinear
models. Equation (11) was obtained for the best model for estimating the WQI. Table 1 indicates
that the best-performing model was MLR-V, which has a total of five input variables; that is, the
MLR model performs best with the highest number of input variables.


$$\mathrm{WQI} = 0.8670 + 0.031\,\mathrm{DO} + 0.095\,\mathrm{pH} + 0.3381\,\mathrm{BOD} - 0.3961\,\mathrm{NH_4} - 0.2471\,\mathrm{WT} \tag{11}$$


The negative values in the estimation serve no purpose in the modeling of WQI. As shown in
Table 1, the MLR performance was satisfactory for the prediction of WQI at Palla, as demonstrated
by the values of MSE=0.00131, DC=0.8919, and RMSE=0.03625 in the verification phase. Figure 4
presents the scatter and time series plots of the measured and estimated WQI values for the MLR
model in the verification phase. The measured and estimated values were well superposed, and the
discrepancies between them were small, which indicates high prediction accuracy.


Table 1. MLR estimation results

Station  Model    Input variables        Calibration                Validation
                                         MSE      DC      RMSE      MSE      DC      RMSE
Palla    MLR-I    DO                     0.00026  0.8970  0.0161    0.00143  0.8819  0.0378
         MLR-II   DO, pH                 0.00028  0.9071  0.0167    0.00135  0.8886  0.0367
         MLR-III  DO, pH, BOD            0.0003   0.8009  0.0172    0.0004   0.8651  0.02
         MLR-IV   DO, pH, BOD, NH4       0.00031  0.8912  0.0177    0.00133  0.8912  0.0365
         MLR-V    DO, pH, BOD, NH4, WT   0.00029  0.8960  0.017     0.00131  0.8919  0.03625
Figure 4. Observed versus predicted WQI: scatter and time series plots of the MLR model


3.2. Result of ANN model 

The feed-forward ANN was trained with the Levenberg-Marquardt algorithm and a sigmoid activation
function, which is a nonlinear exponential function. It is of paramount importance to select an
appropriate number of hidden neurons and network architecture in order to prevent over-learning
in the calibration stage. The results of the ANN model are presented in Table 2. The prediction
accuracy of the ANN was superior to that of the MLR model; the best model for estimating WQI was
ANN-II, with MSE, DC, and RMSE values of 9.0E-8, 0.9974, and 0.0003, respectively, as shown in
Table 2. Figure 5 shows the scatter and time series plots of the measured and estimated WQI values
for the ANN model in the verification phase. Comparing Figures 4 and 5, it is clear that the ANN
estimates fit the observations more closely and are more accurate than those of the MLR model.
This is also supported by the MSE values of the ANN and MLR models. The robustness of the ANN
could be attributed to its ability to handle complex and nonlinear systems, unlike the MLR
models, which are based on the assumption of a linear input-output relationship.


Table 2. ANN estimation results

Station  Model    Model structure  Calibration                    Validation
                                   MSE         DC      RMSE       MSE         DC      RMSE
Palla    ANN-I    (1-1-1)          0.0000036   0.9957  0.0006     0.00000009  0.9946  0.00033
         ANN-II   (2-2-1)          0.00000121  0.9976  0.0011     0.00000009  0.9974  0.0003
         ANN-III  (3-3-1)          0.00000676  0.9951  0.0026     0.00000009  0.9975  0.0003
         ANN-IV   (4-4-1)          0.00001089  0.994   0.0033     0.00000036  0.9954  0.0006
         ANN-V    (5-6-1)          0.00001024  0.9573  0.0032     0.00001156  0.9921  0.0034
Figure 5. Observed versus predicted WQI: scatter and time series plots of the ANN model
3.3. Result of ANFIS model

ANFIS, a hybrid algorithm, was employed with the Takagi-Sugeno-Kang inference system, which works
on the basis of several rules and membership functions. In this study, the ANFIS model consists
of five input variables and one output variable in order to estimate the WQI at the Palla station.
ANFIS was trained on five different models for the station, and triangular and Gaussian
membership functions were tried to find the best model. For the purpose of this research, the
notation 5, (2, trimf, constant) indicates a model with 5 input variables, 2 triangular input
membership functions, and a constant output. Table 3 indicates that the values of MSE, RMSE, and
DC are 8.41E-6, 0.0029, and 0.9909, respectively. ANFIS-II was found to be the best model, with
the two-input combination, as shown in Table 3. Despite the superiority of the ANN model over the
ANFIS model, the performance accuracy of the ANFIS model proved to be reliable in estimating the
WQI at the Palla station. Figure 6 shows the scatter and time series plots of the measured and
estimated WQI values for the ANFIS model in the verification phase. From Figure 6 it is clear
that the ANFIS estimates were closer to the observed WQI values than the MLR estimates.

Comparing Figures 4-6 for the MLR, ANN, and ANFIS models, it is clear that the best ANN and ANFIS
models improved the performance accuracy over MLR by up to 10% in the verification phase. The
difference between the ANN and ANFIS accuracies is negligible, indicating that both models
outperformed the MLR model in terms of estimation accuracy. The results can also be supported by
the box plots of the three best models, which demonstrate how closely the model estimates match
the observed values, as illustrated in Figure 7. According to the figure, the ANN and ANFIS
predictions resemble the observed values. Compared with MLR and ANFIS, ANN had the best fit,
because the closer the data points are to the line of best fit, the better the predictions (see
the scatter plots). Hence, ANN can be regarded as reliable and superior to ANFIS and MLR for the
estimation of WQI.
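
One possible reading of the "up to 10%" improvement is the absolute difference in verification-phase DC between the best AI-based models and the best MLR model. The sketch below computes that difference from the DC values reported in Tables 1-3; the interpretation as percentage points and the function name are assumptions, since the text does not define how the improvement was calculated.

```python
# Verification-phase DC of the best model of each type, as reported in Tables 1-3.
best_dc = {"MLR-V": 0.8919, "ANN-II": 0.9974, "ANFIS-II": 0.9900}


def dc_gain_points(model, baseline="MLR-V"):
    """Absolute DC difference relative to the baseline, in percentage points."""
    return (best_dc[model] - best_dc[baseline]) * 100.0


for name in ("ANN-II", "ANFIS-II"):
    print(f"{name}: +{dc_gain_points(name):.2f} percentage points over MLR-V")
```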


Table 3. ANFIS estimation results

Station  Model      Model structure          Calibration                    Validation
                                             MSE         DC      RMSE       MSE         DC      RMSE
Palla    ANFIS-I    5, (2, trimf, Constant)  0.00000961  0.9078  0.0031     0.00001764  0.9179  0.0042
         ANFIS-II   4, (2, trimf, Constant)  0.00000676  0.9929  0.0026     0.00000841  0.9900  0.0029
         ANFIS-III  3, (2, trimf, Constant)  0.00001521  0.9906  0.0039     0.00000961  0.9807  0.0031
         ANFIS-IV   2, (2, trimf, Constant)  0.00046225  0.9287  0.0215     0.00081225  0.9469  0.0285
         ANFIS-V    2, (2, trimf, Constant)  0.00001225  0.9557  0.0035     0.00001369  0.9384  0.0037
Figure 6. Observed versus predicted WQI: scatter and time series plots of the ANFIS model


It can be seen from Figure 7 that the ANN and ANFIS models outperformed the MLR model; the
predictions of the intelligent models are extremely close to the observed WQI. The intelligent
models were highly accurate in estimating the WQI, achieving DC values close to unity.
Figure 7. Comparison of box plots of the observed WQI and the predictions of the best ANN, ANFIS,
and MLR models


4. CONCLUSION 

This paper has presented MLR, ANN, and ANFIS models for the estimation of the Water Quality Index
(WQI), with water quality variables as inputs. The results indicated that the artificial
intelligence-based models (ANN and ANFIS) outperformed the conventional model (MLR) by up to 10%
in the verification phase. The AI models were able to follow the trajectories of the observed
water quality index accurately. Although the performance of ANN is slightly better than that of
ANFIS, both models outperformed the MLR model in estimating the WQI. The ANN and ANFIS models are
more reliable for the estimation of WQI at the Palla station of the Yamuna River, India. In order
to increase the accuracy, address the uncertainty problems of the models, and explore the
contribution of each input combination, further research should be carried out by employing more
AI-based models in the estimation of WQI. The intelligent models could serve as reliable and
useful tools for estimating the water quality index of the river.